Running Data Step from Python

The datastep action set in CAS allows you to run data step code with the datastep.runcode action. There are a few ways to execute data step code in the Python client. We'll cover each of them here.

Let's get a CAS connection to work with first.


In [1]:
import swat

conn = swat.CAS(host, port, username, password)
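
The host, port, username, and password names in the cell above are not defined anywhere in this notebook. One possible way to supply them is to read them from environment variables, as in the minimal sketch below. The helper name cas_connection_args, the variable names CAS_HOST, CAS_PORT, CAS_USER, and CAS_PASSWORD, and the fallback values are all assumptions made for this example; substitute whatever your site actually uses.

```python
import os


def cas_connection_args(env=None):
    """Collect CAS connection settings from a mapping of environment variables.

    The variable names and the fallback host/port are hypothetical
    placeholders chosen for this example only.
    """
    env = os.environ if env is None else env
    host = env.get('CAS_HOST', 'localhost')    # placeholder default
    port = int(env.get('CAS_PORT', '5570'))    # placeholder default
    username = env.get('CAS_USER')             # None if not set
    password = env.get('CAS_PASSWORD')         # None if not set
    return host, port, username, password


# These four names can then be passed to swat.CAS as in the cell above.
host, port, username, password = cas_connection_args()
```

Keeping credentials out of the notebook itself also makes it easier to share the notebook without editing it first.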

Now we need to get some data into our session.


In [2]:
cls = conn.read_csv('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/class.csv',
                    casout=dict(name='class', caslib='casuser'))
cls


Out[2]:
CASTable('CLASS', caslib='CASUSER(kesmit)')

The datastep.runcode Action

The most basic way to run data step code is to call the datastep.runcode action directly. This action works much like running a data step in SAS, except that you specify CAS tables rather than SAS data sets as your input and output data. In this example, we will compute the body mass index (BMI) of the students in the class data set. The output of the datastep.runcode action contains two keys: InputCasTables and OutputCasTables. Each of those keys points to a DataFrame of information about the input or output tables, including a CASTable object in the last column.


In [3]:
out = conn.datastep.runcode('''
   data bmi(caslib='casuser');
      set class(caslib='casuser');
      BMI = weight / (height**2) * 703;
   run;
''')
out


Out[3]:
§ InputCasTables
casLib Name Rows Columns casTable
0 CASUSER(kesmit) class 19 5 CASTable('class', caslib='CASUSER(kesmit)')

§ OutputCasTables
casLib Name Rows Columns casTable
0 CASUSER(kesmit) bmi 19 6 CASTable('bmi', caslib='CASUSER(kesmit)')

elapsed 0.173s · user 0.548s · sys 0.526s · mem 286MB

We can pull the output CASTable object out of the results using the following line of code. The loc property is a DataFrame indexer that extracts elements by row and column label. In this case, we want the element in the row labeled 0, in the column named casTable.


In [4]:
bmi = out.OutputCasTables.loc[0, 'casTable']
bmi.to_frame()


Out[4]:
Selected Rows from Table BMI
Name Sex Age Height Weight BMI
0 Alfred M 14.0 69.0 112.5 16.611531
1 Henry M 14.0 63.5 102.5 17.870296
2 Jeffrey M 13.0 62.5 84.0 15.117312
3 Louise F 12.0 56.3 77.0 17.077695
4 Ronald M 15.0 67.0 133.0 20.828470
5 Alice F 13.0 56.5 84.0 18.498551
6 James M 12.0 57.3 83.0 17.771504
7 John M 12.0 59.0 99.5 20.094369
8 Mary F 15.0 66.5 112.0 17.804511
9 Thomas M 11.0 57.5 85.0 18.073346
10 Barbara F 13.0 65.3 98.0 16.156788
11 Jane F 12.0 59.8 84.5 16.611531
12 Joyce F 11.0 51.3 50.5 13.490001
13 Philip M 16.0 72.0 150.0 20.341435
14 William M 15.0 66.5 112.0 17.804511
15 Carol F 14.0 62.8 102.5 18.270898
16 Janet F 15.0 62.5 112.5 20.246400
17 Judy F 14.0 64.3 90.0 15.302976
18 Robert M 12.0 64.8 128.0 21.429660

As you can see, we have a new CAS table that now includes the BMI column.
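
As a quick sanity check on the formula used in the data step, the same arithmetic can be reproduced in plain Python for a single row. This is only a sketch: the function name bmi is made up for the example, and it assumes that the class data stores weight in pounds and height in inches, which is what the conversion factor of 703 implies.

```python
def bmi(weight_lb, height_in):
    """Body mass index from weight in pounds and height in inches.

    Uses the same expression as the data step: weight / height**2 * 703.
    """
    return weight_lb / (height_in ** 2) * 703


# Alfred's row from the table above: Height 69.0, Weight 112.5
alfred_bmi = bmi(112.5, 69.0)
print(round(alfred_bmi, 6))
```

The printed value should agree with the BMI column for Alfred in the output above, to the precision displayed there.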

The CASTable datastep Method

CASTable objects have a datastep method that does some of the work of wrapping your data step code with the appropriate input and output data sets. When using this method, you just give the body of the data step code. The output table name will be automatically generated. In this case, the output of the method is a CASTable object that references the newly generated table, so you don't have to extract the CASTable from the underlying action results.


In [5]:
bmi2 = cls.datastep('''BMI = weight / (height**2) * 703''')
bmi2.to_frame()


Out[5]:
Selected Rows from Table DA059B7E75934BF49D3FDC29AEA6322C
Name Sex Age Height Weight BMI
0 Alfred M 4.624071e+18 69.0 112.5 16.611531
1 Henry M 4.624071e+18 63.5 102.5 17.870296
2 Jeffrey M 4.623508e+18 62.5 84.0 15.117312
3 Louise F 4.622945e+18 56.3 77.0 17.077695
4 Ronald M 4.624634e+18 67.0 133.0 20.828470
5 Alice F 4.623508e+18 56.5 84.0 18.498551
6 James M 4.622945e+18 57.3 83.0 17.771504
7 John M 4.622945e+18 59.0 99.5 20.094369
8 Mary F 4.624634e+18 66.5 112.0 17.804511
9 Thomas M 4.622382e+18 57.5 85.0 18.073346
10 Barbara F 4.623508e+18 65.3 98.0 16.156788
11 Jane F 4.622945e+18 59.8 84.5 16.611531
12 Joyce F 4.622382e+18 51.3 50.5 13.490001
13 Philip M 4.625197e+18 72.0 150.0 20.341435
14 William M 4.624634e+18 66.5 112.0 17.804511
15 Carol F 4.624071e+18 62.8 102.5 18.270898
16 Janet F 4.624634e+18 62.5 112.5 20.246400
17 Judy F 4.624071e+18 64.3 90.0 15.302976
18 Robert M 4.622945e+18 64.8 128.0 21.429660
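
To make the idea of "wrapping" concrete, here is a rough sketch of the kind of text that has to surround a data step body before it can be run. The helper wrap_datastep is hypothetical and written only for illustration; it is not how SWAT implements the datastep method, and the table names passed to it are examples.

```python
def wrap_datastep(body, in_table, out_table, caslib='casuser'):
    """Wrap a data step body in data, set, and run statements.

    Hypothetical helper for illustration only; not SWAT's implementation.
    """
    body = body.strip()
    # Data step statements end with a semicolon; add one if it is missing.
    if not body.endswith(';'):
        body += ';'
    return (
        f"data {out_table}(caslib='{caslib}');\n"
        f"   set {in_table}(caslib='{caslib}');\n"
        f"   {body}\n"
        f"run;"
    )


code = wrap_datastep('BMI = weight / (height**2) * 703', 'class', 'bmi2')
print(code)
```

The resulting string has the same shape as the code passed to datastep.runcode in cell In [3], so it could be submitted the same way.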

The casds IPython Magic Command

The third way of running data step code from Python is reserved for IPython users. IPython has commands called "magics". These commands start with % (for one-line commands) or %% (for cell commands) and allow extension developers to add functionality to your environment that isn't necessarily Python-based. SWAT includes a package called swat.cas.magics that can be loaded to surface the %%casds magic command. The %%casds magic lets you enter an entire IPython cell of data step code rather than Python code. This is especially useful in the IPython notebook interface.

Let's give the %%casds magic a try. First we have to load the swat.cas.magics extension.


In [6]:
%load_ext swat.cas.magics

Now we can use the %%casds magic to enter an entire cell of data step code. The %%casds magic requires at least one argument: the CAS connection object where the action should run. In most cases, you'll want to add the --output option as well. It specifies the name of a variable that will be created in the Python environment to hold the output of the datastep.runcode action.


In [7]:
%%casds --output out2 conn

data bmi3(caslib='casuser');
   set class(caslib='casuser');
   BMI = weight / (height**2) * 703;
run;


Out[7]:
§ InputCasTables
casLib Name Rows Columns casTable
0 CASUSER(kesmit) class 19 5 CASTable('class', caslib='CASUSER(kesmit)')

§ OutputCasTables
casLib Name Rows Columns casTable
0 CASUSER(kesmit) bmi3 19 6 CASTable('bmi3', caslib='CASUSER(kesmit)')

elapsed 0.163s · user 0.522s · sys 0.459s · mem 287MB

Just as before, we can extract the output CASTable object from the returned DataFrames.


In [8]:
bmi3 = out2.OutputCasTables.loc[0, 'casTable']
bmi3.to_frame()


Out[8]:
Selected Rows from Table BMI3
Name Sex Age Height Weight BMI
0 Alfred M 14.0 69.0 112.5 16.611531
1 Henry M 14.0 63.5 102.5 17.870296
2 Jeffrey M 13.0 62.5 84.0 15.117312
3 Louise F 12.0 56.3 77.0 17.077695
4 Ronald M 15.0 67.0 133.0 20.828470
5 Alice F 13.0 56.5 84.0 18.498551
6 James M 12.0 57.3 83.0 17.771504
7 John M 12.0 59.0 99.5 20.094369
8 Mary F 15.0 66.5 112.0 17.804511
9 Thomas M 11.0 57.5 85.0 18.073346
10 Barbara F 13.0 65.3 98.0 16.156788
11 Jane F 12.0 59.8 84.5 16.611531
12 Joyce F 11.0 51.3 50.5 13.490001
13 Philip M 16.0 72.0 150.0 20.341435
14 William M 15.0 66.5 112.0 17.804511
15 Carol F 14.0 62.8 102.5 18.270898
16 Janet F 15.0 62.5 112.5 20.246400
17 Judy F 14.0 64.3 90.0 15.302976
18 Robert M 12.0 64.8 128.0 21.429660

Conclusion

If you are an existing SAS user, you may be relieved to find that you can still use data step in the CAS environment. Even better, you can run it from Python. This blend of languages and environments gives you an enormous number of possibilities for data analysis, and should make SAS programmers feel right at home in Python.


In [9]:
conn.close()
